DEBUG-3558 DI: chunk snapshot payloads #5086
base: master
Conversation
Thank you for updating Change log entry section 👏 (Visited at: 2025-11-24 18:05:19 UTC)
Benchmarks
Benchmark execution time: 2025-11-28 06:12:52
Comparing candidate commit 228785a in PR branch.
Found 1 performance improvement and 0 performance regressions! Performance is the same for 43 metrics, 2 unstable metrics.
scenario: profiling - Allocations
Typing analysis
Note: Ignored files are excluded from the next sections.
Untyped methods
This PR introduces 4 untyped methods, and clears 3 untyped methods. It decreases the percentage of typed methods from 54.94% to 54.93% (-0.01%).
Untyped methods (+4 -3)
❌ Introduced:
If you believe a method or an attribute is rightfully untyped or partially typed, you can add
✅ Tests 🎉 All green!
❄️ No new flaky tests detected
🎯 Code Coverage
🔗 Commit SHA: 228785a | Docs | Datadog PR Page
lib/datadog/di/transport/input.rb (outdated)

```ruby
# The maximum chunk size that intake permits is 10 MB.
#
# Two bytes are for the [ and ] of JSON array syntax.
MAX_CHUNK_SIZE = 10 * 1024 * 1024 - 2
```
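The chunk-size accounting above reserves two bytes for the surrounding `[` and `]`, and each element after the first costs one more byte for its separating comma. A minimal sketch of greedy chunking under such a limit (this is an illustration, not the PR's actual implementation; `chunk_payloads` is a hypothetical helper):

```ruby
# Hypothetical sketch: greedily pack serialized snapshots into chunks so
# that each JSON array payload ("[" + elements joined by "," + "]") stays
# within the intake limit. The constant mirrors the one discussed above.
MAX_CHUNK_SIZE = 10 * 1024 * 1024 - 2

def chunk_payloads(serialized_snapshots, max_size = MAX_CHUNK_SIZE)
  chunks = []
  current = []
  current_size = 0
  serialized_snapshots.each do |snapshot|
    # +1 accounts for the comma separating this element from the previous one.
    added = snapshot.bytesize + (current.empty? ? 0 : 1)
    if current_size + added > max_size && !current.empty?
      chunks << current
      current = []
      current_size = 0
      added = snapshot.bytesize
    end
    current << snapshot
    current_size += added
  end
  chunks << current unless current.empty?
  chunks
end
```

Note that a single snapshot larger than `max_size` still ends up alone in its own chunk here; that is the oversized-chunk case the drop path below handles.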
Maybe we could add the size resolution here via MAX_CHUNK_SIZE_BYTES?
lib/datadog/di/transport/input.rb (outdated)

```ruby
if chunk.length == 1 && chunk.first.length > MAX_CHUNK_SIZE
  # Drop the chunk.
  # TODO report via telemetry metric?
  logger.debug { "di: dropping too big snapshot" }
```
Not sure, but could more than one chunk go beyond the max size? Or what is the scenario we are special-casing here?
The behavior changed a bit after I adjusted the code for the correct limits.
There are two relevant limits: the size of any one snapshot (1 MB) and the size of the batch (5 MB).
Given these limits, the batch (with chunking) will never exceed its max size since it is always legal to send a batch of one snapshot and the snapshot is limited to 1 MB.
For individual snapshots, yes, multiple snapshots could exceed their size limit, and each one will be logged. The logging, however, is at debug level, which means customers won't normally see any of it. If this logging becomes an issue in the future I can add some sort of throttling, but for now I don't think it's necessary to worry much about log output.
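The drop path described above can be sketched as follows. This is a hedged illustration, not the PR's code: `send_chunks`, the `max_size` parameter, and the chunk shape (an array of serialized snapshot strings) are assumptions for the example.

```ruby
require "logger"

# A chunk containing a single serialized snapshot that alone exceeds the
# maximum size cannot be split any further, so it is dropped with a
# debug-level log line; all other chunks are handed to the sender block.
def send_chunks(chunks, logger, max_size)
  chunks.each do |chunk|
    if chunk.length == 1 && chunk.first.length > max_size
      logger.debug { "di: dropping too big snapshot" }
      next
    end
    yield chunk
  end
end
```

Because the log call uses the block form, the message string is only built when debug logging is enabled, so the normal (non-debug) path pays almost nothing for the diagnostic.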
Co-authored-by: Sergey Fedorov <[email protected]>
I got the limits wrong in the initial implementation of this PR; this has now been corrected.
* master:
  - Transports: DRY HTTP client code (#5095)
  - DI: extract instance_double_agent_settings_with_stubs to DRY tests (#5087)
  - [PROF-13115] Fix profiler ractor specs failing on Ruby 4
  - [PROF-13115] Bootstrap installing dependencies on Ruby 4.0.0-preview2
  - Clarify support for `rb_obj_info` and why it's OK to not have it
  - Tweak pending to not apply to all Ruby preview versions
  - Do not try to use `rb_obj_info` on Ruby 4.0
  - Adjust stack collector spec to account for changed Ruby 4 behavior
  - [PROF-13115] Disable heap profiling on Ruby 4 preview due to incompatibility
  - Stub sampling in integration tests
  - Rewrite security response tests
  - Bump the gh-actions-packages group across 3 directories with 13 updates
Strech
left a comment
I have a tiny suggestion about the performance of the filter_map.
```ruby
array.map(&block).reject do |item|
  item.nil?
end
```
This could be done in a single iteration with each_with_object (or we can shadow the item and reduce variables)
Suggested change:

```ruby
# Before:
array.map(&block).reject do |item|
  item.nil?
end

# After:
array.each_with_object([]) do |item, memo|
  new_item = block.call(item)
  memo.push(new_item) unless new_item.nil?
end
```
@p-datadog I think you have reverted the suggestion, but if you want to cover lazy evaluation, you can add a simple check and keep the single-iteration/single-allocation branch:

```ruby
elsif array.is_a?(Enumerator::Lazy)
  # your example for lazy enumerator
else
  # my suggestion each_with_object(...)
end
```

Co-authored-by: Sergey Fedorov <[email protected]>
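For illustration, the three approaches discussed in this thread behave identically on an eager array but differ in allocations and in laziness; `block` and `array` below are placeholder names. Note that Ruby 2.7+ also ships a built-in `Enumerable#filter_map` that does the filtering in a single pass:

```ruby
# Placeholder inputs for the comparison.
block = ->(n) { n.even? ? n * 10 : nil }
array = [1, 2, 3, 4]

# Two passes plus an intermediate array (the original code):
two_pass = array.map(&block).reject(&:nil?)

# One pass, one result allocation (the reviewer's suggestion):
one_pass = array.each_with_object([]) do |item, memo|
  new_item = block.call(item)
  memo.push(new_item) unless new_item.nil?
end

# Built-in single-pass alternative on Ruby 2.7+:
builtin = array.filter_map(&block)

# Lazy enumerator: the map/reject chain keeps evaluation deferred,
# which each_with_object would force eagerly.
lazy = array.lazy.map(&block).reject(&:nil?).first(1)
```

This is why the branch on `Enumerator::Lazy` matters: `each_with_object` would realize the whole lazy sequence, defeating the point of laziness.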
This reverts commit ca985cf.
Strech
left a comment
I've left a suggestion about the filter_map path for when it's not a lazy enumerator.
What does this PR do?
Implements chunking of dynamic instrumentation snapshot payloads to stay under the 10 MB intake limit.
Motivation:
Without chunking, if a probe (or multiple probes) produce many events, all of the events will be dropped by the intake.
Change log entry
Yes: dynamic instrumentation: fix sending large quantities of snapshots
Additional Notes:
There is also "snapshot pruning" which is supposed to be done to reduce the size of individual snapshot events below the 1 MB limit for a "log line". This is not in scope for this PR (or Ruby Live Debugger GA).
This PR ensures that if many events are generated by Live Debugger, they all get sent to the backend, but does not help with huge individual snapshots getting dropped.
This PR does however add diagnostics when snapshots are dropped due to excessive size.
How to test the change?
Unit tests added